A New HDFS Structure Model to Evaluate the Performance of Word Count Application on Different File Size
Authors
Abstract
MapReduce is a powerful distributed processing model for large datasets, and Hadoop is an open-source framework that implements it. The Hadoop Distributed File System (HDFS) has become very popular for building large-scale, high-performance distributed data processing systems. HDFS is designed mainly to handle large files, so processing massive numbers of small files is a challenge for native HDFS. This paper introduces an approach to optimize the performance of processing massive small files on HDFS. We design a new HDFS structure model whose main idea is to merge the small files, writing each small file directly into a merged file at the source. Experimental results show that the proposed scheme can effectively improve the storage and access efficiency of massive small files on HDFS.
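The abstract does not detail the paper's merged-file format. As a minimal illustration only, the following Java sketch shows one common way to pack a directory of small files into a single HDFS container using Hadoop's SequenceFile API, storing each small file as a (filename, bytes) record; the SmallFileMerger class and its command-line arguments are hypothetical and not taken from the paper.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Hypothetical sketch: pack many small files into one SequenceFile on HDFS.
public class SmallFileMerger {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path inputDir = new Path(args[0]);   // directory holding the small files
        Path mergedFile = new Path(args[1]); // single merged output file

        SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(mergedFile),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class));
        try {
            for (FileStatus status : fs.listStatus(inputDir)) {
                if (status.isDirectory()) {
                    continue; // skip subdirectories
                }
                byte[] content = new byte[(int) status.getLen()];
                FSDataInputStream in = fs.open(status.getPath());
                try {
                    in.readFully(content); // read the whole small file
                } finally {
                    in.close();
                }
                // key = original file name, value = raw file bytes
                writer.append(new Text(status.getPath().getName()),
                        new BytesWritable(content));
            }
        } finally {
            IOUtils.closeStream(writer);
        }
    }
}

With the small files packed into one container like this, a MapReduce job such as word count reads a single large file instead of opening thousands of small ones, which reduces NameNode metadata load and task-scheduling overhead.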
Similar Resources
Generalization of Dynamic Two Stage Models in DEA: An Application in Saderat Bank
Dynamic network data envelopment analysis (DNDEA) has attracted a lot of attention in recent years. On the one hand, the available models in DNDEA evaluate the performance of a DMU with interrelated processes over specified multiple periods; on the other hand, they can only measure the efficiency of a dynamic network structure when a supply chain structure is present. For example, in the banking in...
Development of a Non-Radial Network Model to Evaluate the Performance of a Multi-Stage Sustainable Supply Chain
Abstract: The purpose of this paper is to present a new model of non-radial data envelopment analysis that is able to evaluate systems with complete network structures; one such network is the supply chain of the cement industry. In this paper, using a non-radial model in data envelopment analysis, a model with a network structure that can assess the sustainable supply chain of strategic industries is evaluated....
Live Website Traffic Analysis Integrated with Improved Performance for Small Files using Hadoop
Hadoop, an open-source Java framework, deals with big data. It has HDFS (Hadoop Distributed File System) and MapReduce. HDFS is designed to handle large files across clusters and suffers a performance penalty when dealing with large numbers of small files. These large numbers of small files place a heavy burden on the NameNode of HDFS and increase execution time for MapReduce. Secondly, ...
Data-Intensive File Systems for Internet Services: A Rose by Any Other Name...
Data-intensive distributed file systems are emerging as a key component of large scale Internet services and cloud computing platforms. They are designed from the ground up and are tuned for specific application workloads. Leading examples, such as the Google File System, Hadoop distributed file system (HDFS) and Amazon S3, are defining this new purpose-built paradigm. It is tempting to classif...